The simplest maximum entropy model for collective behavior in a neural network 
Recent work emphasizes that the maximum entropy principle provides a bridge between statistical mechanics models for collective behavior in neural networks and experiments on networks of real neurons. Most of this work has focused on capturing the measured correlations among pairs of neurons. Here we suggest an alternative: constructing models that are consistent with the distribution of global network activity, i.e., the probability that K out of N cells in the network generate action potentials in the same small time bin. The inverse problem that we need to solve in constructing the model is analytically tractable, and provides a natural "thermodynamics" for the network in the limit of large N. We analyze the responses of neurons in a small patch of the retina to naturalistic stimuli, and find that the implied thermodynamics is very close to an unusual critical point, in which the entropy (in proper units) is exactly equal to the energy.

Many of the most interesting phenomena of life are collective, emerging from interactions among many elements, and physicists have long hoped that these collective biological phenomena could be described within the framework of statistical mechanics. One approach to a statistical mechanics of biological systems is exemplified by Hopfield's discussion of neural networks, in which simplifying assumptions about the underlying dynamics lead to an effective "energy landscape" on the space of network states [1]-[3]. In a similar spirit, Toner and Tu showed that simple stochastic dynamical models for coordinating the motion of moving organisms, as in flocks of birds or schools of fish, can be mapped to an effective field theory in the hydrodynamic limit [4, 5].

A very different way of constructing a statistical mechanics for real biological systems is through the maximum entropy principle [6]. Rather than making specific assumptions about the underlying dynamics, we take a relatively small set of measurements on the system as given. We then build a model for the distribution over system states that is consistent with these experimental results but otherwise has as little structure as possible. This automatically generates a Boltzmann-like distribution, defining an energy landscape over the states of the system. Importantly, this energy function has no free parameters; it is completely determined by the experimental measurements. As an example, consider small windows of time in which each neuron in a network either generates an action potential (spike) or remains silent. The maximum entropy distribution consistent with the mean probability of spiking in each neuron and the correlations among spikes in pairs of neurons is then exactly an Ising spin glass [7]. Similarly, the maximum entropy model consistent with the average correlations between the flight direction of a single bird and its immediate neighbors in a flock is a Heisenberg model [5]. This approach began with the use of pairwise maximum entropy models to describe small (N = 10–15) networks of neurons in the retina. It has since been used to describe the activity in a variety of neural networks [51116), the structure and activity of biochemical and genetic networks [ITJUH], the statistics of amino acid substitutions in protein families [TM2"Tj], and the rules of spelling in English words [55]. Here we return to the retina, taking advantage of new electrode arrays that make it possible to record from a large fraction of the ~200 output neurons within a small, highly interconnected patch of the circuitry [27]. Our goal is not to give a precise model, but rather to construct the simplest model that gives us a glimpse of the collective behavior in this system. For a different approach to simplification, see Ref [28].

The maximum entropy approach is much more general than the construction of models based on pairwise correlations. To be concrete, we consider small slices of time during which each neuron in our network either generates an action potential or remains silent. The states of individual neurons are then defined by $\sigma_i = 1$ when neuron i generates a spike, and $\sigma_i = -1$ when neuron i is silent. States of the entire network are defined by $\sigma \equiv \{\sigma_i\}$, and we are interested in the probability distribution of these states, $P(\sigma)$. If we know the average values of some functions $f_\mu(\sigma)$, then the maximum entropy distribution consistent with this knowledge is

$$
P(\sigma) = \frac{1}{Z}\exp\left[\sum_{\mu} g_\mu f_\mu(\sigma)\right], \tag{1}
$$

where the couplings $g_\mu$ have to be adjusted to match the measured expectation values $\langle f_\mu(\sigma) \rangle$.

FIG. 1: Experimental results for $P_N(K)$, in groups of N = 40 neurons. At left, solid points show the distribution estimated by averaging over many randomly chosen groups of N = 40 cells out of the N = 160 in our data set; error bars are standard deviations across random halves of the duration of the experiment. Open circles are the expectation if cells are independent. At right, the distribution of correlation coefficients among pairs of neurons in our sample. Because the experiment is long, the threshold for statistical significance of the correlations is very low, $|C_{\rm thresh}| < 0.01$. Almost all pairs of cells thus have significant correlations, but these correlations are weak.
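For a handful of neurons, the couplings $g_\mu$ in $P(\sigma) \propto \exp[\sum_\mu g_\mu f_\mu(\sigma)]$ can be fitted by brute force. The procedure enumerates all $2^N$ states, computes the model expectation values, and moves each $g_\mu$ along the gradient of the log-likelihood, which is simply (measured minus model) $\langle f_\mu \rangle$. The sketch below is a minimal illustration under our own assumptions: plain gradient ascent with a fixed step size, and features chosen as the single-neuron terms $\sigma_i$ and pairwise products $\sigma_i\sigma_j$ (the Ising case mentioned above). The function names are hypothetical, and exhaustive enumeration is only feasible for small N.

```python
import itertools
import math


def expectations(g, features, n):
    """Exact <f_mu> under P(sigma) ∝ exp(sum_mu g_mu f_mu(sigma)), enumerating all 2**n states."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(sum(gm * f(s) for gm, f in zip(g, features))) for s in states]
    z = sum(weights)  # partition function Z
    return [sum(w * f(s) for w, s in zip(weights, states)) / z for f in features]


def fit_maxent(features, targets, n, lr=0.2, steps=3000):
    """Gradient ascent on the log-likelihood: d(log L)/d g_mu = <f_mu>_data - <f_mu>_model."""
    g = [0.0] * len(features)
    for _ in range(steps):
        model = expectations(g, features, n)
        g = [gm + lr * (t - m) for gm, t, m in zip(g, targets, model)]
    return g


# Ising-type features for n = 3: single-neuron terms sigma_i, then pairwise products sigma_i * sigma_j.
N = 3
FEATURES = [lambda s, i=i: s[i] for i in range(N)] + [
    lambda s, i=i, j=j: s[i] * s[j] for i, j in itertools.combinations(range(N), 2)
]

# Synthetic check (not data): generate targets from known couplings, then recover them.
G_TRUE = [0.2, -0.1, 0.05, 0.3, -0.2, 0.1]
TARGETS = expectations(G_TRUE, FEATURES, N)
G_FIT = fit_maxent(FEATURES, TARGETS, N)
```

Because the log-likelihood is concave in the couplings, this simple iteration converges for a small enough step. For realistic N, the exact enumeration has to be replaced by sampling or approximations.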

In any given slice of time, we will find that K out of the N neurons generate spikes, where

$$
K = \sum_{i=1}^{N} \frac{1+\sigma_i}{2}. \tag{2}
$$

One of the basic characteristics of a network is the distribution of this global activity, $P_N(K)$. As an example, in Fig 1 we show experimental results on $P_N(K)$ for groups of N = 40 neurons in the retina as it views a naturalistic movie. In these experiments [27], we use a dense array of electrodes that samples 160 out of the ~200 ganglion cells in a small patch of the salamander retina, and we divide time into bins of Δt = 20 ms. The figure shows the average behavior in groups of N = 40 cells chosen out of this network, under conditions where a naturalistic movie is projected onto the retina. The correlations between pairs of cells are small, but $P_N(K)$ departs dramatically from what would be expected if the neurons generated spikes independently.
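Estimating $P_N(K)$ from a recording amounts to counting, in each time bin, how many of the N cells spike, and then histogramming those counts. For the independent-neuron comparison (open circles in Fig 1), one needs the distribution of a sum of independent spike/no-spike variables with different spike probabilities. That distribution can be built up one neuron at a time by convolution. The sketch below assumes the raster is given as a list of time bins, each a sequence of $\sigma_i \in \{-1, +1\}$. The function names are ours, not the authors'.

```python
from collections import Counter


def global_activity_distribution(raster):
    """Empirical P_N(K); raster is a list of time bins, each a sequence of sigma_i in {-1, +1}."""
    n = len(raster[0])
    # K = sum_i (1 + sigma_i) / 2 counts the spiking neurons in each bin.
    counts = Counter(sum((1 + s) // 2 for s in row) for row in raster)
    return [counts.get(k, 0) / len(raster) for k in range(n + 1)]


def spike_probabilities(raster):
    """Per-neuron probability of spiking in a bin, <(1 + sigma_i) / 2>."""
    n = len(raster[0])
    return [sum((1 + row[i]) // 2 for row in raster) / len(raster) for i in range(n)]


def independent_prediction(rates):
    """P(K) if neurons spike independently with probabilities `rates` (a Poisson-binomial law)."""
    p = [1.0]  # distribution of K when no neurons have been added yet
    for r in rates:
        nxt = [0.0] * (len(p) + 1)
        for k, pk in enumerate(p):
            nxt[k] += pk * (1 - r)  # this neuron is silent
            nxt[k + 1] += pk * r    # this neuron spikes
        p = nxt
    return p


raster = [[-1, -1, -1], [1, -1, -1], [1, 1, -1], [1, 1, 1]]  # toy raster: 4 bins, 3 neurons
print(global_activity_distribution(raster))
print(independent_prediction(spike_probabilities(raster)))
```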

How do we construct the maximum entropy model consistent with the measured $P_N(K)$? Knowing the distribution $P_N(K)$ is equivalent to knowing all its moments, so the functions whose expectation values we have measured are $f_1(\sigma) = K$, $f_2(\sigma) = K^2$, and so on. Thus

$$
P(\sigma) = \frac{1}{Z}\exp\left[\sum_{n=1}^{N} g_n K^n(\sigma)\right] \equiv \frac{1}{Z}\exp\left[-V_N(K(\sigma))\right], \tag{3}
$$

where $V_N(K)$ is some effective potential that we need to choose so that $P_N(K)$ comes out equal to the experimentally measured $P_N^{\rm exp}(K)$.

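Usually the inverse problem for these maximum entropy distributions is hard. Here it is much easier. We note that

$$
P_N(K) \equiv \sum_{\sigma} P(\sigma)\,\delta_{K,\,K(\sigma)}, \tag{4}
$$

and, since $P(\sigma)$ depends on the state only through K,

$$
P_N(K) = \frac{1}{Z}\,\mathcal{N}(K,N)\,\exp\left[-V_N(K)\right], \tag{5}
$$

where

$$
\mathcal{N}(K,N) = \frac{N!}{K!\,(N-K)!} \tag{6}
$$

is the number of network states in which exactly K neurons spike. The log of this number is an entropy at fixed K, $S_N(K) = \ln \mathcal{N}(K,N)$, so we can write

$$
P_N(K) = \frac{1}{Z}\exp\left[S_N(K) - V_N(K)\right]. \tag{7}
$$

Finally, to match the distribution $P_N(K)$ to the experimental measurement $P_N^{\rm exp}(K)$, we must have

$$
V_N(K) = -\ln P_N^{\rm exp}(K) + S_N(K) - \ln Z. \tag{8}
$$

In Fig 2 we show the average results for $V_N(K)$ in networks of size N = 40.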
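Because $P_N(K) = \frac{1}{Z}\exp[S_N(K) - V_N(K)]$ with $S_N(K) = \ln \mathcal{N}(K,N)$ (the log of a binomial coefficient), solving the inverse problem is a one-line computation for each K. The additive constant $\ln Z$ is arbitrary. In this sketch we fix it by setting $V_N(0) = 0$, assigning zero energy to the silent state (the convention adopted later in the text); this requires $P_N^{\rm exp}(0) > 0$. Values of K that were never observed leave $V_N(K)$ unconstrained, and we mark them with infinity. That handling, like the function names, is our assumption.

```python
import math


def log_binomial(n, k):
    """S_N(K) = ln[N! / (K! (N - K)!)], computed with log-gamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def effective_potential(p_exp):
    """V_N(K) = -ln P_exp(K) + S_N(K) - ln Z for K = 0..N, with ln Z fixed by V_N(0) = 0."""
    n = len(p_exp) - 1
    log_z = -math.log(p_exp[0])  # silent state at zero energy, so P(0) = 1/Z
    v = []
    for k, pk in enumerate(p_exp):
        if pk <= 0.0:
            v.append(math.inf)  # K never observed: the data do not constrain V_N(K)
        else:
            v.append(-math.log(pk) + log_binomial(n, k) - log_z)
    return v


def reconstruct(v):
    """Forward map: P_N(K) = exp[S_N(K) - V_N(K)] / Z, with Z the sum over K = 0..N."""
    n = len(v) - 1
    w = [math.exp(log_binomial(n, k) - vk) for k, vk in enumerate(v)]
    z = sum(w)
    return [wk / z for wk in w]
```

As a sanity check, unbiased independent neurons (a binomial $P_N^{\rm exp}(K)$ with spike probability 1/2) give a flat potential, $V_N(K) = 0$ for every K. Any structure in $V_N(K)$ therefore measures departure from that reference.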

We expect that both energy and entropy will be extensive quantities. For the entropy $S_N(K)$ this is guaranteed by Eq (6), which tells us that as N becomes large, $S_N(K) \to N s(K/N)$, with $s(x) = -x\ln x - (1-x)\ln(1-x)$. Whether the networks we are studying have something analogous to a thermodynamic limit, in which $V_N(K) \to N e(K/N)$ for large N, is an experimental question. This is illustrated at right in Fig 2, where for K/N = 0.05 we study the dependence of the energy per neuron on 1/N. There is a natural extrapolation to large N, and this holds for all the ratios K/N that we tested.
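The extrapolation in Fig 2 can be done by fitting the finite-size values to a straight line in 1/N and reading off the intercept. The following is a minimal sketch built on two assumptions of our own. First, the dependence is purely linear, $V_N(aN)/N \approx e(a) + c/N$; the Fig 4 caption reports this form for the free energy, but the text does not state the fitting procedure used for Fig 2. Second, the fit is unweighted least squares. The function name is hypothetical, and the numbers in the check are synthetic, not data.

```python
def extrapolate_in_inverse_n(ns, values):
    """Unweighted least-squares fit of values ≈ intercept + slope / N; returns (intercept, slope).

    The intercept is the N -> infinity estimate (the square in the right panels of Figs 2 and 4).
    """
    xs = [1.0 / n for n in ns]
    mx = sum(xs) / len(xs)
    my = sum(values) / len(values)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic values lying exactly on 0.3 + 2/N (illustration only).
ns = [20, 40, 80, 160]
print(extrapolate_in_inverse_n(ns, [0.3 + 2.0 / n for n in ns]))
```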

In the $N \to \infty$ limit, the natural quantities are the energy and entropy per neuron, e and s, respectively, and these are shown in Fig 3. One clear result is that, as we look at more and more neurons in the same patch of the retina, we do see the emergence of a well-defined, smooth relationship between entropy and energy, $s(e)$. While most neural network models are constructed so that this thermodynamic limit exists, it is not obvious that it should exist in real data. In particular, consider a family of models with varying N in which all pairs of neurons are coupled. The standard way of arriving at a thermodynamic limit is to scale the coupling strengths with N, and correspondingly the pairwise correlations are expected to vary with N. In constructing maximum entropy models, we can't follow this path, since the correlations are measured and thus by definition don't vary as we include more and more neurons. Here we focus not on correlations but on the distribution $P_N(K)$, and thus the emergence of a thermodynamic limit depends on how this distribution evolves with N.

FIG. 2: The effective potential and its dependence on system size. At left, results for N = 40 neurons, showing both the potential $V_N(K)$ (points with error bars) and the entropy $S_N(K)$ (smooth curve); error bars are as in Fig 1. At right, the behavior of $V_N(K = aN)/N$ for $a = 0.05$, showing the dependence on N (points with error bars) and the extrapolation $N \to \infty$ (square).
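The limit $S_N(K) \to N s(K/N)$, with $s(x) = -x\ln x - (1-x)\ln(1-x)$, follows from Stirling's approximation. The leading finite-size correction to $S_N(K)/N$ is of order $(\ln N)/N$. The short check below compares the exact finite-N entropy per neuron with this limit at K/N = 0.05, the ratio used in Fig 2. The function names are ours.

```python
import math


def entropy_per_neuron(x):
    """Limiting entropy per neuron s(x) at spike fraction x = K/N."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def finite_n_entropy_per_neuron(n, k):
    """S_N(K)/N = ln[N! / (K! (N - K)!)] / N, computed with log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / n


for n in (40, 160, 10_000):
    k = n // 20  # K/N = 0.05
    gap = entropy_per_neuron(k / n) - finite_n_entropy_per_neuron(n, k)
    print(n, gap)  # positive finite-size deficit that shrinks as N grows
```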

We recall that the plot of entropy vs. energy tells us everything about the thermodynamics of the system. In our maximum entropy construction there is no real temperature; $k_BT$ just provides units for the effective energy $V_N(K)$. But if we have a model for the energy as a function of the microscopic state of the system, then we can take this seriously as a statistical mechanics problem and imagine varying the temperature. More precisely, we can generalize Eq (3) to consider

$$
P_\beta(\sigma) = \frac{1}{Z(\beta)}\exp\left[-\beta V_N(K(\sigma))\right], \tag{9}
$$

where the real system is at β = 1. Then in the thermodynamic limit we have the usual identities: the inverse temperature is given by $ds/de = \beta$, the specific heat is $C = N\beta^2\left(-d^2s/de^2\right)^{-1}$, and so on. In particular, the vanishing of the second derivative of the entropy implies a diverging specific heat, a signature of a critical point.
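At finite N, the specific heat of a model that depends on the state only through K follows from the standard fluctuation formula $C(\beta) = \beta^2 \langle (\delta V_N)^2 \rangle_\beta$ (with $k_B = 1$). Here the average is over $P_\beta(K) \propto \mathcal{N}(K,N)\,e^{-\beta V_N(K)}$. The sketch below uses two toy potentials of our own choosing; neither is a fit to the data. The first is linear in K, which makes the neurons independent, so $C/N$ does not depend on N. The second is the marginal case $V_N(K) = S_N(K)$, i.e. $s(e) = e$. For it, $P_{\beta=1}(K)$ is flat and $C/N$ grows with N, the finite-size signature of the divergence just described. The function names are hypothetical.

```python
import math


def log_binomial(n, k):
    """ln[N! / (K! (N - K)!)] via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def specific_heat(v, beta):
    """C(beta) = beta^2 Var_beta(V) for P_beta(K) ∝ N(K, N) exp(-beta V(K)), K = 0..N."""
    n = len(v) - 1
    log_w = [log_binomial(n, k) - beta * vk for k, vk in enumerate(v)]
    top = max(log_w)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    z = sum(w)
    mean = sum(wk * vk for wk, vk in zip(w, v)) / z
    var = sum(wk * (vk - mean) ** 2 for wk, vk in zip(w, v)) / z
    return beta ** 2 * var


results = {}
for n in (50, 200):
    linear = [3.0 * k for k in range(n + 1)]                # toy: each spike costs 3 units
    marginal = [log_binomial(n, k) for k in range(n + 1)]  # toy: s(e) = e exactly
    results[n] = (specific_heat(linear, 1.0) / n, specific_heat(marginal, 1.0) / n)
    print(n, results[n])
```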

In our case, since the real system is at β = 1, the behavior of the network will be dominated by states with an energy per neuron that solves the equation $ds/de = 1$. But Fig 3 shows us that, as we consider more and more neurons, the function $s(e)$ seems to be approaching $s = \beta_0 e$, where $\beta_0 = 0.999 \pm 0.004$ is equal to one within errors. If we had exactly $s = e$, then all energies would be solutions of the condition $ds/de = 1$. Correspondingly, the specific heat C would diverge, signaling that β = 1 is a critical point. This is a very unusual critical point, since all higher derivatives of the entropy vanish [30].

FIG. 3: Entropy vs. energy. We compute the effective energy per neuron, $e = V_N(K)/N$, averaged over multiple groups of N neurons chosen out of the 160 we have access to in the experiment, and then compare with the entropy per neuron, $s = S_N(K)/N$. The extrapolation is as in Fig 2, and error bars in energy (visible only when larger than symbols) are as in Fig. 1.
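The statement that $s(e)$ approaches $s = \beta_0 e$ amounts to fitting a line through the origin to the extrapolated points of Fig 3. A minimal least-squares sketch follows. It assumes unweighted points; the text does not say how the quoted error on $\beta_0$ was obtained. It also computes finite-difference slopes, to show that when $s = e$ every energy satisfies $ds/de = 1$. The function names and the synthetic points are ours.

```python
def slope_through_origin(energies, entropies):
    """Least-squares beta_0 for s ≈ beta_0 * e with no intercept: sum(e*s) / sum(e*e)."""
    num = sum(e * s for e, s in zip(energies, entropies))
    den = sum(e * e for e in energies)
    return num / den


def local_slopes(energies, entropies):
    """Finite-difference ds/de between consecutive points, after sorting by energy."""
    pts = sorted(zip(energies, entropies))
    return [(s1 - s0) / (e1 - e0) for (e0, s0), (e1, s1) in zip(pts, pts[1:])]


# Synthetic points on s = e (not data): every local slope equals 1.
print(local_slopes([0.0, 0.1, 0.2, 0.3], [0.0, 0.1, 0.2, 0.3]))  # → [1.0, 1.0, 1.0]
```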

More generally, when we try to describe the probability distribution over states σ using ideas from statistical mechanics, we are free to choose the zero of the (effective) energy as we wish. A convenient choice is that the unique state of zero spikes (complete silence in the network) should have zero energy. Unless there are exponentially many states with probability equal to that of the silent state (which seems unlikely), in the large N limit the entropy per neuron will also be zero at zero energy. With this choice for the zero of energy, the probability of the silent state is given by $P_{\rm silence} = 1/Z$, and $Z = e^{-F}$, where F is the effective free energy, since we are at β = 1. Thus if we can measure this probability reliably, we can "measure" the free energy without any further assumptions. We see in Fig 4 that the probability of silence falls as we look at more and more neurons. This makes sense, since the free energy should be extensive, growing in magnitude with system size. But the decline in the probability of silence is surprisingly slow. We can make this more precise by computing the effective free energy per neuron, $f = F/N$, also shown. This is a very small number indeed: $f \approx -0.01$ at the largest value, N = 160, for which we have data.
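With the silent state at zero energy, $Z = \sum_K \mathcal{N}(K,N)\,e^{-V_N(K)}$ and $P_{\rm silence} = 1/Z = e^{F}$, so the free energy per neuron is simply $f = \ln P_{\rm silence}/N$. The sketch below computes f both ways for a toy potential of our own choosing (each spike costs 2 units of energy, not the measured $V_N$) to show that they agree. The function names are hypothetical.

```python
import math


def free_energy_from_silence(p_silence, n):
    """f = F/N with F = -ln Z and P_silence = 1/Z (silent state at zero energy, beta = 1)."""
    return math.log(p_silence) / n


def free_energy_from_potential(v):
    """f = -ln Z / N with Z = sum_K N(K, N) exp(-V_N(K)); assumes V_N(0) = 0."""
    n = len(v) - 1
    z = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1) - vk)
        for k, vk in enumerate(v)
    )
    return -math.log(z) / n


# Toy potential: V(K) = 2K, so V(0) = 0 and Z = (1 + e^-2)^N.
n = 40
v = [2.0 * k for k in range(n + 1)]
p_silence = 1.0 / sum(math.comb(n, k) * math.exp(-vk) for k, vk in enumerate(v))
print(free_energy_from_silence(p_silence, n), free_energy_from_potential(v))
```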

We recall that, with $k_BT = 1$, the free energy per neuron is $f = \langle e \rangle - s_{\rm total}$, where $\langle e \rangle$ denotes the average energy and $s_{\rm total}$ is the total entropy of the system, again normalized per neuron. Our best estimate of the entropy of the states taken on by the network is $s_{\rm total} \approx 0.2$ per neuron. The free energy therefore reflects a cancellation between energy and entropy with a precision of at least ~5%. If we extrapolate to the thermodynamic limit, the cancellation becomes even more precise, so that the extensive component of the free energy is $f_\infty = -0.0051 \pm 0.00003$ (Fig 4). Notice that the small value of the free energy means that the silent state occurs frequently. Hence we can measure its probability very accurately, and the error bars are small. If we had a critical system in which $s(e) = e$, the extensive component of the free energy would be exactly zero.

FIG. 4: The probability of silence, and the effective free energy. At left, the probability that a network of N neurons is in the silent state, where none of the cells generate a spike within a window Δt; error bars as in Fig 1. Note that this probability declines very slowly with the number of neurons N. At right, we translate the probability of silence into an effective free energy per neuron, and see that this varies linearly with 1/N; the extrapolation $N \to \infty$ is shown as a square.
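The 5% figure follows directly from the two numbers quoted in the paragraph above. As a worked check (our arithmetic, using $f \approx -0.01$ at N = 160 and $s_{\rm total} \approx 0.2$):

$$
\langle e \rangle = f + s_{\rm total} \approx -0.01 + 0.2 = 0.19,
\qquad
\frac{s_{\rm total} - \langle e \rangle}{s_{\rm total}} = \frac{|f|}{s_{\rm total}} \approx \frac{0.01}{0.2} = 0.05 ,
$$

so the average energy and the entropy per neuron agree to within about 5% of either one.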

In a normal thermodynamic limit (and β = 1), $f_\infty = e^* - s(e^*)$, where $e^*$ is the energy at which $ds/de = 1$. Geometrically, $f_\infty$ is the intercept along the energy axis of a line with unit slope that is tangent to the curve $s(e)$ at the point $e^*$. From above we have $s(0) = 0$. If $s(e)$ is also concave ($d^2s(e)/de^2 < 0$, so that the specific heat is everywhere positive), we are guaranteed that $f_\infty \leq 0$. But to have $f_\infty \to 0$ then requires that $ds(e)/de < 1$ at $e = 0$. In this scenario, pushing $f_\infty$ toward zero requires both $e^*$ and $s(e^*)$ to approach zero, so that the network is in a (near) zero entropy state despite the finite temperature. This state would be similar to the critical point in the random energy model [29], but this seems inconsistent with the evidence for a nonzero entropy per neuron.
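The tangent construction is a Legendre transform: at β = 1, $f_\infty = \min_e\,[e - s(e)]$, and the minimizer is $e^*$. The sketch below evaluates this on a grid for a toy concave entropy of our own choosing, $s(e) = a\ln(1+e)$, which has $s(0) = 0$ and $ds/de = a$ at $e = 0$. Three cases illustrate the scenarios in the text:

- For $a > 1$, the tangent point is $e^* = a - 1$ and $f_\infty = (a-1) - a\ln a < 0$.
- For $a < 1$, the minimum sits at $e^* = 0$ with $f_\infty = 0$, the zero-entropy scenario described above.
- For $s(e) = e$, every energy gives $e - s(e) = 0$.

The grid range and the function names are assumptions made for illustration.

```python
import math


def legendre_free_energy(s, e_grid):
    """Return (f_inf, e_star) with f_inf = min_e [e - s(e)] over the grid (beta = 1)."""
    values = [e - s(e) for e in e_grid]
    i = min(range(len(values)), key=values.__getitem__)
    return values[i], e_grid[i]


E_GRID = [i * 1e-4 for i in range(50_001)]  # energies per neuron from 0 to 5


def toy_entropy(a):
    """Concave toy entropy s(e) = a ln(1 + e): s(0) = 0 and ds/de = a at e = 0."""
    return lambda e: a * math.log1p(e)


for a in (2.0, 0.5):
    f_inf, e_star = legendre_free_energy(toy_entropy(a), E_GRID)
    print(a, f_inf, e_star)
```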

Having near zero free energy with nonzero entropy seems to require something very special. One possibility is to allow $d^2s(e)/de^2 > 0$, allowing phase coexistence between the $e = 0$ silent state and some other $e \neq 0$ state. The other possibility is to have $s(e) = e$, as suggested by Fig 3. Thus, while the observation of a nearly zero free energy per neuron does not prove that the entropy is equal to the energy for all energies, it does tell us that the network is in or near one of a handful of unusual collective states.

The model we have considered here of course throws away many things: we are not keeping track of the identities of the cells, but rather trying to capture the global activity of the network. On the other hand, because we are considering a maximum entropy model, we know that what we are constructing is the least structured model that is consistent with $P_N(K)$. It is thus surprising that this minimal model is so singular. As we have emphasized, even without appealing to a model, we know that there is something special about these networks of neurons because they exhibit an almost perfect cancellation of energy and entropy. The more detailed maximum entropy analysis suggests that this cancellation is not just true on average. Rather, the entropy appears to be almost precisely equal to the energy as a function. This is consistent with hints of criticality in previous analyses, which extrapolated from much smaller groups of neurons [El], [13], [30], although much more remains to be done.